Calculate distances between addresses using Bing Maps

by Zongyan Wang


Posted on July 26, 2018


Imagine you want to compare the following distances:
1. Times Square, New York to Walt Disney World Resort, Orlando
2. You and the girl/boy you are chasing.
This article shows how to calculate the first distance with Python.

What you need:
Python --3.7
geocoder --1.38.1
pandas --0.23.3

How can I check my Python package versions?
For example, to find your pandas version, open a Python interpreter and run:

>>> import pandas 
>>> pandas.__version__ 
'0.23.3'
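To check several versions at once, a small script like the sketch below can help. It assumes Python 3.8 or newer (for `importlib.metadata`, which is not available on the Python 3.7 listed above) and that the packages are installed under the distribution names shown; `installed_version` is a hypothetical helper name, not part of any library.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Python itself is not an installed distribution, so read its version from sys
    print("python", sys.version.split()[0])
    for name in ["geocoder", "pandas"]:
        print(name, installed_version(name) or "not installed")
```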
                      

Now here is your data.

import pandas as pd

df = pd.DataFrame({'A_address': ['Times Square'],
                   'A_city': ['Manhattan'],
                   'A_state': ['NY'],
                   'B_address': ['Walt Disney World Resort'],
                   'B_city': ['Orlando'],
                   'B_state': ['FL']}, index=range(1))

Next, we will:
1. Get the latitude and longitude (lat, lng) of A and B.
2. Calculate the distance from those coordinates.
To geocode an address, we first concatenate its parts into a full address:

df['A_full_address'] = ["%s, %s %s"%(addr, city, state) for addr, city, state in zip(df.A_address, df.A_city, df.A_state)] 
df['B_full_address'] = ["%s, %s %s"%(addr, city, state) for addr, city, state in zip(df.B_address, df.B_city, df.B_state)]

Then, calculate the latitude and longitude using geocoder.
To run the following code, you first need a Bing Maps API key. Be careful not to waste your money on duplicate lookups: if you have multiple rows, check how many unique addresses you have first.
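import geocoder

bing_key = 'YOUR_BING_MAPS_KEY'  # replace with your own Bing Maps key

# If you have duplicates, keep one row per unique address before geocoding:
# df_a = df.loc[:, [x for x in df.columns if x[0] == 'A']]
# df_a = df_a.groupby('A_full_address').first().reset_index(drop=False)

# Geocode each A address and store its coordinates
for i in df.index:
    g = geocoder.bing(df.loc[i, 'A_full_address'], key=bing_key)
    df.loc[i, 'A_lng'] = g.lng
    df.loc[i, 'A_lat'] = g.lat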
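One way to follow the advice about duplicates is to geocode each unique address once and reuse the result. The sketch below is illustrative and not part of geocoder: `geocode_unique` is a hypothetical helper, and `lookup` stands in for any function that takes an address and returns `(lng, lat)`, for example a small wrapper around `geocoder.bing(address, key=bing_key)`.

```python
def geocode_unique(addresses, lookup):
    """Call lookup() once per distinct address and return {address: (lng, lat)}.

    `lookup` is any callable mapping an address string to a (lng, lat) pair;
    here it is assumed to wrap a paid geocoding call such as geocoder.bing.
    """
    results = {}
    for address in addresses:
        if address not in results:  # skip addresses we have already paid for
            results[address] = lookup(address)
    return results


if __name__ == "__main__":
    calls = []

    def fake_lookup(address):
        # Stand-in for a real geocoding request; records each address it receives
        calls.append(address)
        return (0.0, 0.0)

    coords = geocode_unique(["Times Square, Manhattan NY"] * 3, fake_lookup)
    print(len(calls), "lookup(s) for 3 rows")
```

In the article's setting you could call `geocode_unique(df['A_full_address'], lookup)` and then map the coordinates back onto every row with `Series.map`.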
                        
                      

Alternatively, you can use the pandas apply method:
def get_loc(address):
    # Geocode one address and return (longitude, latitude)
    g = geocoder.bing(address, key=bing_key)
    return g.lng, g.lat

# apply() yields one (lng, lat) tuple per row; zip(*...) splits them into two columns
df['B_lng'], df['B_lat'] = zip(*df['B_full_address'].apply(get_loc))

The haversine function comes from https://stackoverflow.com/questions/4913349/haversine-formula-in-python-bearing-and-distance-between-two-gps-points. Here is the function:

  
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance, in miles, between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c  # 6367 km: approximate radius of the earth
    mile = km * 0.621371  # convert kilometres to miles
    return round(mile, 2)  # keep the result numeric, rounded to 2 decimals
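For reference, this is the formula the function implements. With latitudes φ and longitudes λ in radians, Δφ = φ2 − φ1, Δλ = λ2 − λ1, and the earth radius R = 6367 km used in the code:

```latex
a = \sin^2\!\left(\frac{\Delta\varphi}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\!\left(\frac{\Delta\lambda}{2}\right),
\qquad c = 2 \arcsin\sqrt{a},
\qquad d_{\text{km}} = R\,c,
\qquad d_{\text{mi}} = 0.621371\, d_{\text{km}}
```

A mean radius of 6371 km is also common; switching between the two changes the result by less than 0.1%.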
                          
                      

Finally, calculate the distance with the haversine function defined above:

 
df['distance_in_miles'] = df.apply(lambda obj: haversine(*obj[['A_lng', 'A_lat', 'B_lng', 'B_lat']]), axis = 1)
                  

Your final output will look something like this (a single row, shown with one column per line):

A_address            Times Square
A_city               Manhattan
A_state              NY
B_address            Walt Disney World Resort
B_city               Orlando
B_state              FL
A_full_address       Times Square, Manhattan NY
B_full_address       Walt Disney World Resort, Orlando FL
A_lng                -73.966248
A_lat                40.783436
B_lng                -81.582626
B_lat                28.403811
distance_in_miles    957.21
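As an offline sanity check of the last column, the sketch below reruns the same haversine arithmetic (earth radius 6367 km and 0.621371 miles per km, as in the function above) on the coordinates shown in the output, so it needs no API key. `distance_miles` is just a local name for this check.

```python
from math import asin, cos, radians, sin, sqrt


def distance_miles(lon1, lat1, lon2, lat2, radius_km=6367):
    """Great-circle distance in miles between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * asin(sqrt(a)) * radius_km * 0.621371


# Coordinates copied from the output above (A = Times Square, B = Walt Disney World Resort)
d = distance_miles(-73.966248, 40.783436, -81.582626, 28.403811)
print(round(d, 2))
```

The printed value should agree with the `distance_in_miles` column, roughly 957 miles.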